International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines

TITLE Optimization of Machine Learning Models for Big Data Applications
ABSTRACT We present a comprehensive analysis of scalable optimization techniques for machine learning (ML) on big data. We review research on optimization algorithms, with an emphasis on work from the last five years, covering stochastic first-order methods (SGD and momentum), adaptive methods (AdaGrad/RMSProp, Adam and its variants, LARS/LAMB), second-order and approximate second-order techniques (Newton, Hessian-free, K-FAC, Shampoo), and variance-reduction methods (SVRG, SAGA, etc.), and we discuss their convergence properties and practical trade-offs[1][2]. We also cover distributed and federated optimization: data-, model-, and pipeline-parallel training; asynchronous updates (e.g., Hogwild!); communication-efficient methods (gradient quantization and sparsification)[3][4]; and federated averaging (FedAvg) versus distributed synchronous SGD[5][6]. We identify benchmark “big data” datasets across vision (ImageNet: 14M images[7]), text (English Wikipedia: 3.9M articles, 2.24B tokens[8]; Common Crawl: billions of pages, ~400 TiB/month[9]), recommendation (Criteo CTR: from 45M to billions of examples[10]), and graphs (OGB-LSC MAG240M: 244M nodes, 1.7B edges[11]). We detail scalable frameworks (TensorFlow, PyTorch DDP, Horovod, Ray, Spark MLlib, etc.), summarizing their parallelism models, strengths, and limitations[12][13]. We propose experimental designs that compare optimizers on large-scale tasks, specifying metrics (accuracy, throughput, time-to-accuracy[6], communication cost), hardware setup (GPU/TPU clusters), and statistical analysis plans. The research gaps we identify include integrating adaptive and second-order techniques at scale, designing better asynchronous and federated algorithms, and performing hyperparameter search in distributed settings. We outline a potential paper structure, suggest high-impact publication venues (journals such as IEEE Transactions on Big Data and IEEE TKDE; conferences such as ICML, NeurIPS, and MLSys), and list relevant research groups for peer review.
Tables compare optimization algorithms, datasets, and frameworks; a Gantt chart (Mermaid) sketches a project timeline.
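To make the first-order and adaptive update rules named in the abstract concrete, the following minimal Python sketch applies heavy-ball momentum SGD and Adam (Kingma & Ba) to a small ill-conditioned quadratic. The objective, the use of an exact gradient in place of a minibatch gradient, and all step sizes are illustrative assumptions, not the paper's experimental setup.

```python
import math
from typing import List

# Illustrative ill-conditioned quadratic f(x) = 0.5 * (1 * x0^2 + 100 * x1^2).
# The exact gradient stands in for a stochastic minibatch gradient (assumption).
CURVATURE = [1.0, 100.0]


def loss(x: List[float]) -> float:
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURE, x))


def grad(x: List[float]) -> List[float]:
    return [c * xi for c, xi in zip(CURVATURE, x)]


def sgd_momentum(x: List[float], steps: int = 200, lr: float = 0.009,
                 beta: float = 0.9) -> List[float]:
    """Heavy-ball momentum: v <- beta * v + g, then x <- x - lr * v."""
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x


def adam(x: List[float], steps: int = 200, lr: float = 0.05, beta1: float = 0.9,
         beta2: float = 0.999, eps: float = 1e-8) -> List[float]:
    """Adam: per-coordinate steps scaled by bias-corrected moment estimates."""
    m = [0.0] * len(x)  # first moment: EMA of gradients
    v = [0.0] * len(x)  # second moment: EMA of squared gradients
    for t in range(1, steps + 1):
        g = grad(x)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        x = [xi - lr * mh / (math.sqrt(vh) + eps)
             for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x


if __name__ == "__main__":
    start = [1.0, 1.0]
    print("initial loss:", loss(start))
    print("momentum SGD:", loss(sgd_momentum(start)))
    print("Adam:        ", loss(adam(start)))
```

With this form of momentum, the step size must stay below roughly 2(1 + beta)/L for the stiffest coordinate (here L = 100), whereas Adam's per-coordinate normalization lets one step size serve both coordinates.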
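Variance reduction, listed in the abstract alongside SVRG and SAGA, can be illustrated with a minimal sketch of SVRG (Johnson & Zhang, 2013) on a one-dimensional least-squares problem. The toy data set, step size, and epoch counts are hypothetical; the last inner iterate is reused as the next snapshot, which corresponds to the simpler of the two snapshot options in the original SVRG paper.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]  # (x, y) pairs for the model y ~ w * x


def grad_i(w: float, x: float, y: float) -> float:
    """Gradient of one component loss f_i(w) = 0.5 * (w * x - y)^2."""
    return (w * x - y) * x


def svrg(data: Data, w0: float = 0.0, lr: float = 0.025, epochs: int = 30,
         inner_steps: int = 100, seed: int = 0) -> float:
    """SVRG for a 1-D least-squares model (illustrative hyperparameters)."""
    rng = random.Random(seed)
    n = len(data)
    w = w0
    for _ in range(epochs):
        snapshot = w
        # One full pass computes the exact gradient at the snapshot.
        full_grad = sum(grad_i(snapshot, x, y) for x, y in data) / n
        for _ in range(inner_steps):
            x, y = data[rng.randrange(n)]
            # Unbiased estimate of the full gradient at w whose variance
            # shrinks as w and the snapshot approach the optimum.
            g = grad_i(w, x, y) - grad_i(snapshot, x, y) + full_grad
            w -= lr * g
    return w


if __name__ == "__main__":
    data = [(0.5, 1.4), (1.0, 3.1), (1.5, 4.4), (2.0, 6.2)]
    w_star = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    print("SVRG:", svrg(data), "closed form:", w_star)
```

Unlike plain SGD with a constant step size, which keeps fluctuating around the optimum, the corrected gradient vanishes at the optimum, so the iterates converge to the closed-form least-squares solution.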
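For the communication-efficient methods mentioned in the abstract, the following sketch shows the stochastic s-level rounding at the core of QSGD (Alistarh et al.). It is a minimal illustration, assuming a plain Python list as the gradient. It omits the bucketing and lossless integer encoding that the QSGD paper uses to actually reduce bits on the wire; the function name `qsgd_quantize` is hypothetical.

```python
import math
import random
from typing import List


def qsgd_quantize(v: List[float], s: int, rng: random.Random) -> List[float]:
    """Unbiased stochastic s-level quantization in the style of QSGD.

    Each coordinate becomes norm * sign(v_i) * level / s, where level is
    rounded up or down at random so that E[Q(v)] = v.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        # Position of |v_i| on the grid {0, 1/s, ..., 1} relative to the norm;
        # clamped so floating-point error cannot push it past s.
        scaled = min(abs(x) / norm * s, float(s))
        lower = math.floor(scaled)
        p_up = scaled - lower  # probability of rounding up to lower + 1
        level = lower + 1 if rng.random() < p_up else lower
        out.append(math.copysign(norm * level / s, x))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    g = [0.3, -1.2, 2.5, 0.0]
    print(qsgd_quantize(g, s=4, rng=rng))
```

A worker would transmit only the norm, the signs, and the small integer levels; averaging many quantized copies recovers the original gradient because the rounding is unbiased.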
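The federated averaging (FedAvg) scheme contrasted with synchronous SGD in the abstract can be sketched as repeated rounds of local training followed by a server-side average weighted by client dataset size, as introduced by McMahan et al. This is a minimal sketch under stated assumptions: a scalar linear model, deterministic full-batch local gradient steps instead of local SGD, and hypothetical noise-free client data drawn from y = 2x, so that every client shares the same optimum and client drift does not arise.

```python
from typing import List, Sequence, Tuple

Client = List[Tuple[float, float]]  # (x, y) pairs held locally by one client


def local_update(w: float, data: Client, lr: float, local_steps: int) -> float:
    """Several full-batch gradient steps on the client's own least-squares loss."""
    for _ in range(local_steps):
        g = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * g
    return w


def fedavg_aggregate(models: Sequence[float], sizes: Sequence[int]) -> float:
    """Server step: average client models weighted by n_k / n."""
    total = sum(sizes)
    return sum(m * n / total for m, n in zip(models, sizes))


def fedavg(clients: List[Client], rounds: int, lr: float = 0.1,
           local_steps: int = 5) -> float:
    w = 0.0  # global model broadcast to every client at the start of a round
    for _ in range(rounds):
        local_models = [local_update(w, data, lr, local_steps) for data in clients]
        w = fedavg_aggregate(local_models, [len(d) for d in clients])
    return w


if __name__ == "__main__":
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(0.5, 1.0)],
        [(1.5, 3.0), (1.0, 2.0), (3.0, 6.0)],
    ]
    print(fedavg(clients, rounds=50))
```

Only one model per client crosses the network each round, instead of one gradient per step as in synchronous SGD; with heterogeneous (non-IID) client optima, the local steps would bias the result, which is one reason federated optimization remains an open problem.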
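The time-to-accuracy metric from DAWNBench[6] measures the wall-clock time at which a training run first reaches a fixed validation accuracy (for example, 93% top-5 accuracy on ImageNet in DAWNBench). A minimal sketch of computing it from a training log, assuming the log is a chronological list of (elapsed seconds, validation accuracy) pairs, could look like this; the log values below are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(log: Iterable[Tuple[float, float]],
                     threshold: float) -> Optional[float]:
    """Return the first elapsed time at which validation accuracy reaches threshold.

    `log` holds (elapsed_seconds, validation_accuracy) pairs in chronological
    order. Returns None if the run never reaches the target.
    """
    for elapsed, acc in log:
        if acc >= threshold:
            return elapsed
    return None


if __name__ == "__main__":
    # Hypothetical evaluation checkpoints from one training run.
    log = [(600.0, 0.71), (1200.0, 0.88), (1800.0, 0.931), (2400.0, 0.935)]
    print(time_to_accuracy(log, 0.93))
```

Because it is end-to-end, this metric captures both hardware throughput and statistical efficiency: an optimizer that processes more samples per second but needs more epochs to converge does not necessarily reduce time-to-accuracy.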
AUTHOR PRAJWAL M DIVATAGI, PRATEEK S KUDARI, MARUTHI B PUJAR (UG Students, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India); ARCHANA K N (Assistant Professor, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India)
VOLUME 183
DOI 10.15680/IJIRCCE.2026.1404118
PDF pdf/118_Optimization of Machine Learning Models for Big Data Applications.pdf
KEYWORDS
References 1. SGD/Adaptive: Bottou (SGD); Duchi et al. (JMLR 2011, AdaGrad); Kingma & Ba (ICLR 2015, Adam)[1]; Dozat (ICLR 2016 Workshop, NAdam)[15].
2. Large-Batch: Goyal et al. (arXiv 2017, large-minibatch SGD training ResNet-50 on 256 GPUs)[2]; You et al. (ICLR 2020, LAMB)[20].
3. Second-Order: Martens & Grosse (ICML 2015, K-FAC)[21]; Gupta et al. (ICML 2018, Shampoo)[22].
4. Variance Reduction: Johnson & Zhang (NIPS 2013, SVRG).
5. Distributed Training: Dean et al. (NIPS 2012, DistBelief); Niu et al. (NIPS 2011, Hogwild!); Sergeev & Del Balso (2018, Horovod)[13].
6. Communication: Alistarh et al. (NIPS 2017, QSGD)[3][4].
7. Hyperparameter Optimization: Li et al. (ICLR 2017, Hyperband)[29]; Jaderberg et al. (arXiv 2017, PBT)[27][28].
8. Frameworks: Abadi et al. (OSDI 2016, TensorFlow)[12]; Paszke et al. (NeurIPS 2019, PyTorch); Sergeev & Del Balso (2018, Horovod)[13]; Moritz et al. (OSDI 2018, Ray)[32].
9. Benchmarks: Deng et al. (CVPR 2009, ImageNet); Common Crawl documentation; Tullie Murrell (2025, overview of the Criteo dataset)[10]; Hu et al. (NeurIPS 2020, OGB)[11].
10. Time-to-Accuracy: Coleman et al. (arXiv 2018, DAWNBench analysis)[6][26].
11. [1] [2] [3] [4] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] Large-Scale Deep Learning Optimizations: A Comprehensive Survey. https://arxiv.org/pdf/2111.00856
12. [5] [12] [13] [25] [30] [33] [34] [35] The Landscape of Modern Machine Learning: A Review of Machine, Distributed and Federated Learning. https://arxiv.org/html/2312.03120v1
13. [6] [26] Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark (arXiv:1806.01427). https://arxiv.org/abs/1806.01427
14. [7] ImageNet. Wikipedia. https://en.wikipedia.org/wiki/ImageNet
15. [8] Experiments on the English Wikipedia. gensim documentation. https://radimrehurek.com/gensim/wiki.html
16. [9] Inside Common Crawl: The Dataset Behind AI Models (and Its Real World Limits). DEV Community. https://dev.to/extractdata/inside-common-crawl-the-dataset-behind-ai-models-and-its-real-world-limits-2eo2
17. [10] Criteo Dataset: Tackling Large-Scale Click-Through Rate Prediction. Shaped. https://www.shaped.ai/blog/criteo-dataset-tackling-large-scale-click-through-rate-prediction
18. [11] Overview of OGB-LSC. Open Graph Benchmark. https://ogb.stanford.edu/docs/lsc/
19. [24] Stochastic variance reduction. Wikipedia. https://en.wikipedia.org/wiki/Stochastic_variance_reduction
20. [27] [28] Population Based Training of Neural Networks (arXiv:1711.09846). https://arxiv.org/abs/1711.09846
21. [29] vldb.org. https://www.vldb.org/pvldb/vol15/p1256-li.pdf
22. [31] [32] [36] [37] Ray: A Distributed Framework for Emerging AI Applications (arXiv:1712.05889). https://ar5iv.labs.arxiv.org/html/1712.05889
Copyright © IJIRCCE 2020. All rights reserved.